Abstract

Multivariate normal mixtures provide a flexible model for high-dimensional data. They are widely used in statistical genetics, statistical finance, and other disciplines. Due to the unboundedness of the likelihood function, classical likelihood-based methods, which may have nice practical properties, are inconsistent. In this paper, we recommend a penalized likelihood method for estimating the mixing distribution. We show that the maximum penalized likelihood estimator is strongly consistent when the number of components has a known upper bound. We also explore a convenient EM-algorithm for computing the maximum penalized likelihood estimator. Extensive simulations are conducted to explore the effectiveness and the practical limitations of both the new method and the ratified maximum likelihood estimators. Guidelines are provided based on the simulation results.

Key words: Multivariate normal mixture; penalized maximum likelihood estimator; strong consistency.
PACS: 02.50.-r 



1 Introduction 



In the past few decades, there has been a rapidly growing literature on mixture models [22] [ISl [El E]. Various mixture distributions, including normal mixtures, are used in a wide variety of situations. Schork et al. [12] reviewed the applications of mixture models in human genetics, and Tadesse et al. [20] used a normal mixture model for cluster analysis. Further application examples can be found in [3], [12], [IS] and [T].



Finite mixtures of multivariate normals have also drawn substantial attention recently. Lindsay and Basak [13] devised a system of moment equations and a fast algorithm to estimate the parameters of multivariate normal mixture distributions under an equal-covariance-matrix assumption. However, the equality assumption is crucial: when it fails, the fit loses substantial accuracy [15]. Allowing unequal component variances, on the other hand, makes the likelihood function unbounded [3]. Placing a positive lower bound on the component variances helps, but the resulting statistical procedure can be awkward because it is not continuous in the data. Placing a positive lower bound on the ratio of the component variances is better. In the univariate case, the resulting constrained maximum likelihood estimator is consistent for both constant and shrinking lower bounds [Hl] [21]. Though its consistency is yet to be proved, Ingrassia [U] applied the constrained method to multivariate observations. Ray and Lindsay [17] found that, in contrast to the univariate case, a multivariate normal mixture density can have more modes than the number of components. Inference on multivariate normal mixture models is hence more difficult.



In this paper, we investigate a penalized likelihood method for estimating the mixing distribution. Penalized likelihood estimators form a popular class of methods; see [3, H]. When the number of components has a known upper bound, the maximum penalized likelihood estimator (PMLE) is shown to be strongly consistent. An EM-algorithm is developed and extensive simulations are conducted. In the univariate case, the usual maximum likelihood estimator, once ratified by removing degenerate local maxima, performs similarly to the PMLE [2]; for multivariate normal mixture models, however, the PMLE is advantageous.



The paper is organized as follows. In Section 2, the penalized likelihood method is introduced. Two theorems on strong consistency are presented, with the proofs deferred to the Appendix. The EM-algorithm for maximizing the penalized likelihood function is given. Section 3 contains the simulation results.
2 Penalized likelihood method



2.1 Consistency of the PMLE

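Let $\phi(\mathbf{x};\boldsymbol{\mu},\Sigma)$ be the multivariate normal density with $d\times 1$ mean vector $\boldsymbol{\mu}$ and $d\times d$ covariance matrix $\Sigma$, i.e.,

$$\phi(\mathbf{x};\boldsymbol{\mu},\Sigma) = (2\pi)^{-d/2}|\Sigma|^{-1/2}\exp\{-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{\top}\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\}.$$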

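A $d$-dimensional random vector $\mathbf{X}$ has a finite multivariate normal mixture distribution of order $p$ if its density function is given by

$$f(\mathbf{x};G) = \pi_1\phi(\mathbf{x};\boldsymbol{\mu}_1,\Sigma_1) + \pi_2\phi(\mathbf{x};\boldsymbol{\mu}_2,\Sigma_2) + \cdots + \pi_p\phi(\mathbf{x};\boldsymbol{\mu}_p,\Sigma_p), \qquad (1)$$

where $G$ is the mixing distribution assigning probability $\pi_j$ to the parameter set $(\boldsymbol{\mu}_j,\Sigma_j)$ of the $j$th kernel density $\phi(\mathbf{x};\boldsymbol{\mu}_j,\Sigma_j)$.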
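The mixture density (1) is easiest to evaluate on the log scale. The sketch below is a minimal illustration assuming NumPy; the helper names `mvn_logpdf` and `mixture_logpdf` are hypothetical and not taken from the paper. It uses a Cholesky factor of each $\Sigma_j$ for the quadratic form and the log-determinant, and a log-sum-exp to combine the weighted components.

```python
import numpy as np

def mvn_logpdf(X, mu, sigma):
    """log phi(x; mu, Sigma) for every row x of the (n, d) array X."""
    X = np.atleast_2d(X)
    d = X.shape[1]
    L = np.linalg.cholesky(sigma)              # raises LinAlgError unless Sigma is positive definite
    z = np.linalg.solve(L, (X - mu).T)         # columns are L^{-1}(x_i - mu)
    maha = np.sum(z ** 2, axis=0)              # (x_i - mu)^T Sigma^{-1} (x_i - mu)
    logdet = 2.0 * np.sum(np.log(np.diag(L)))  # log |Sigma|
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)

def mixture_logpdf(X, pis, mus, sigmas):
    """log f(x; G) for the p-component normal mixture (1), via log-sum-exp."""
    comps = np.column_stack([np.log(p) + mvn_logpdf(X, m, s)
                             for p, m, s in zip(pis, mus, sigmas)])
    top = comps.max(axis=1, keepdims=True)
    return (top + np.log(np.exp(comps - top).sum(axis=1, keepdims=True))).ravel()

# Example: a 0.3/0.7 bivariate mixture evaluated at two points.
x = np.array([[0.0, 0.0], [0.0, 3.0]])
print(mixture_logpdf(x, [0.3, 0.7], [np.array([0.0, -3.0]), np.array([0.0, 3.0])],
                     [np.eye(2), np.eye(2)]))
```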

Let $\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_n$ be a random sample from (1). Then

$$l_n(G) = \sum_{i=1}^{n}\log f(\mathbf{x}_i;G)$$

is the log-likelihood function. Even if $\pi_j > 0$ for all $j$, $l_n(G)$ is unbounded at $\boldsymbol{\mu}_1 = \mathbf{x}_1$ when $|\Sigma_1|$ gets arbitrarily small. The penalized log-likelihood function is of the form

$$pl_n(G) = l_n(G) + p_n(G),$$

where $p_n(G)$ is the penalty depending on the mixing distribution $G$ and the sample size $n$. Let $\hat G_n$ be the mixing distribution in the parameter space at which $pl_n(G)$ attains its maximum. We call $\hat G_n$ the penalized maximum likelihood estimator (PMLE).
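The unboundedness of $l_n(G)$ can be seen numerically. In the sketch below (NumPy assumed; `loglik` is a hypothetical helper, and a deterministic 5 x 5 grid of points stands in for a random sample), $\boldsymbol{\mu}_1$ is placed on $\mathbf{x}_1$ and $\Sigma_1 = sI$ is shrunk toward singularity while the second component is held fixed.

```python
import numpy as np

def loglik(X, pis, mus, sigmas):
    """l_n(G) = sum_i log f(x_i; G) for a normal mixture (log-sum-exp for stability)."""
    n, d = X.shape
    comps = np.empty((n, len(pis)))
    for j, (p, m, s) in enumerate(zip(pis, mus, sigmas)):
        L = np.linalg.cholesky(s)
        z = np.linalg.solve(L, (X - m).T)
        comps[:, j] = (np.log(p) - 0.5 * d * np.log(2 * np.pi)
                       - np.log(np.diag(L)).sum() - 0.5 * (z ** 2).sum(axis=0))
    top = comps.max(axis=1, keepdims=True)
    return float((top + np.log(np.exp(comps - top).sum(axis=1, keepdims=True))).sum())

# A 5 x 5 grid of "observations" keeps the example deterministic.
X = np.array([[i, j] for i in range(-2, 3) for j in range(-2, 3)], dtype=float)
values = []
for s in [1e-1, 1e-3, 1e-6, 1e-9]:
    # mu_1 = x_1 and Sigma_1 = s * I, so |Sigma_1| = s^2 -> 0; component 2 stays fixed.
    G = ([0.5, 0.5], [X[0], np.zeros(2)], [s * np.eye(2), np.eye(2)])
    values.append(loglik(X, *G))
print(values)
```

Once $s$ is small, each thousand-fold reduction adds about $\log 1000$ to the contribution of $\mathbf{x}_1$ alone, so $l_n(G)$ has no finite supremum.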

We choose a penalty function such that:

C1. $p_n(G) = \sum_{j=1}^{p}\tilde p_n(\Sigma_j)$.

C2. At any fixed $G$ such that $|\Sigma_j| > 0$ for all $j = 1,2,\ldots,p$, we have $p_n(G) = o(n)$, and $\sup_G\max\{0, p_n(G)\} = o(n)$. In addition, $p_n(G)$ is differentiable with respect to $G$ and, as $n\to\infty$, $p_n'(G) = o(\sqrt{n})$ at any fixed $G$ such that $|\Sigma_j| > 0$ for all $j = 1,2,\ldots,p$. Here we treat $G$ as the vector of parameters contained in the mixing distribution $G$.

C3. For large enough $n$, $\tilde p_n(\Sigma) \le 4(\log n)^2\log|\Sigma|$ when $|\Sigma|$ is smaller than $cn^{-2d}$ for some $c > 0$.

These conditions are quite flexible, and functions satisfying them can easily be constructed. A class of such functions is given in Section 2.2 and used in the simulation section. Condition C1 simplifies the numerical computation. Condition C2 limits the effect of the penalty. The key condition is C3: it counters the damaging effect of a degenerate component covariance matrix. The order of the penalty size is well calibrated, as will be seen in the proof, yet the exact value of the constant 4 is not important. From a Bayesian point of view, the penalty function can also be viewed as the logarithm of a prior density.

Theorem 1. Assume that the true density function

$$f(\mathbf{x};G_0) = \sum_{j=1}^{p_0}\pi_{0j}\phi(\mathbf{x};\boldsymbol{\mu}_{0j},\Sigma_{0j})$$

satisfies $\pi_{0j} > 0$, $|\Sigma_{0j}| > 0$, and $(\boldsymbol{\mu}_{0j},\Sigma_{0j}) \ne (\boldsymbol{\mu}_{0k},\Sigma_{0k})$ for all $j,k = 1,2,\ldots,p_0$ with $j \ne k$.

Assume that the penalty function $p_n(G)$ satisfies C1-C3 and that $\tilde G_n$ is a mixing distribution of order $p_0$ satisfying

$$pl_n(\tilde G_n) - pl_n(G_0) \ge c > -\infty$$

for all $n$. Then, as $n\to\infty$, $\tilde G_n \to G_0$ almost surely.
The proof is deferred to the Appendix.

Since $pl_n(\hat G_n) - pl_n(G_0) \ge 0$, the PMLE $\hat G_n$ is strongly consistent. Because $\hat G_n$ and $G_0$ have the same order, all elements of $\hat G_n$ converge to those of $G_0$ almost surely. Furthermore, let

$$S_n(G) = \frac{\partial l_n(G)}{\partial G}$$

be the vector score function at $G$, and let

$$S_n'(G) = \frac{\partial^2 l_n(G)}{\partial G\,\partial G^{\top}}$$

be the matrix of second derivatives of the log-likelihood function. At $G = G_0$, the normal mixture model is regular and hence the Fisher information

$$I_n(G_0) = nI(G_0) = -E\{S_n'(G_0)\} = E\{S_n(G_0)S_n^{\top}(G_0)\}$$

is positive definite. Using classical asymptotic techniques as in [11], and since condition C2 gives $p_n'(G) = o(n^{1/2})$, we have

$$\hat G_n - G_0 = \{-S_n'(G_0)\}^{-1}S_n(G_0) + o_p(n^{-1/2}).$$

Therefore, $\hat G_n$ is an asymptotically normal and efficient estimator.
Theorem 2. Under the same conditions as in Theorem 1, as $n\to\infty$,

$$\sqrt{n}\,(\hat G_n - G_0) \to N(0, I^{-1}(G_0))$$

in distribution.

The proof is straightforward and omitted. In practice, we may know only an upper bound for $p_0$ rather than its exact value. The following theorem deals with this situation.

Theorem 3. Assume the same conditions as in Theorem 1, except that the order $p_0$ of the finite normal mixture model is known only to be smaller than or equal to $p$. Let $\tilde G_n$ be a mixing distribution of order $p$ satisfying

$$pl_n(\tilde G_n) - pl_n(G_0) \ge c > -\infty$$

for all $n$. Then, as $n\to\infty$, $\tilde G_n \to G_0$ almost surely.
The proof is deferred to the Appendix.



2.2 The EM-algorithm 



We recommend the EM-algorithm because it is simple to code and, under very general conditions, is guaranteed to converge to some local maximum [21] [ISl El. In our simulations, we use a number of initial values to reduce the risk of poor local maxima. We also recommend some convenient and effective penalty functions for the EM-algorithm.

Let $z_{ij}$ be the membership indicator variable that equals 1 when $\mathbf{x}_i$ is from the $j$th component of the normal mixture model, and 0 otherwise. Up to an additive constant, the complete-data log-likelihood under a normal mixture model is then given by

$$l_c(G) = \sum_{i=1}^{n}\sum_{j=1}^{p} z_{ij}\Big\{\log\pi_j - \frac{1}{2}\log|\Sigma_j| - \frac{1}{2}(\mathbf{x}_i-\boldsymbol{\mu}_j)^{\top}\Sigma_j^{-1}(\mathbf{x}_i-\boldsymbol{\mu}_j)\Big\}.$$

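Given the current mixing distribution

$$G^{(m)} = (\pi_1^{(m)},\ldots,\pi_p^{(m)},\boldsymbol{\mu}_1^{(m)},\ldots,\boldsymbol{\mu}_p^{(m)},\Sigma_1^{(m)},\ldots,\Sigma_p^{(m)}),$$

the EM-algorithm iterates as follows.
In the E-step, we compute

$$w_{ij}^{(m+1)} = \frac{\pi_j^{(m)}\phi(\mathbf{x}_i;\boldsymbol{\mu}_j^{(m)},\Sigma_j^{(m)})}{\sum_{k=1}^{p}\pi_k^{(m)}\phi(\mathbf{x}_i;\boldsymbol{\mu}_k^{(m)},\Sigma_k^{(m)})}.$$

Replacing $z_{ij}$ by $w_{ij}^{(m+1)}$ in $l_c(G)$, we get

$$\begin{aligned}
Q(G;G^{(m)}) &= E\{l_c(G) + p_n(G)\mid\mathbf{x}_1,\ldots,\mathbf{x}_n,G^{(m)}\}\\
&= \sum_{j=1}^{p}(\log\pi_j)\sum_{i=1}^{n}w_{ij}^{(m+1)} - \frac{1}{2}\sum_{j=1}^{p}(\log|\Sigma_j|)\sum_{i=1}^{n}w_{ij}^{(m+1)}\\
&\quad - \frac{1}{2}\sum_{j=1}^{p}\sum_{i=1}^{n}w_{ij}^{(m+1)}(\mathbf{x}_i-\boldsymbol{\mu}_j)^{\top}\Sigma_j^{-1}(\mathbf{x}_i-\boldsymbol{\mu}_j) + p_n(G).
\end{aligned}$$

This completes the E-step.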
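A minimal sketch of the E-step, assuming NumPy (the function name `e_step` is hypothetical): it returns the $n\times p$ matrix of $w_{ij}^{(m+1)}$, normalizing on the log scale so that very small component densities do not underflow.

```python
import numpy as np

def e_step(X, pis, mus, sigmas):
    """Return the (n, p) matrix w_ij = P(z_ij = 1 | x_i, G^(m))."""
    n, d = X.shape
    logw = np.empty((n, len(pis)))
    for j, (p, m, s) in enumerate(zip(pis, mus, sigmas)):
        L = np.linalg.cholesky(s)
        z = np.linalg.solve(L, (X - m).T)
        # log of pi_j^(m) * phi(x_i; mu_j^(m), Sigma_j^(m))
        logw[:, j] = (np.log(p) - 0.5 * d * np.log(2 * np.pi)
                      - np.log(np.diag(L)).sum() - 0.5 * (z ** 2).sum(axis=0))
    logw -= logw.max(axis=1, keepdims=True)   # guard against underflow before normalizing
    w = np.exp(logw)
    return w / w.sum(axis=1, keepdims=True)

X = np.array([[0.0, -5.0], [0.0, 5.0], [0.0, 0.0]])
w = e_step(X, [0.5, 0.5], [np.array([0.0, -5.0]), np.array([0.0, 5.0])],
           [np.eye(2), np.eye(2)])
```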

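In the M-step, we maximize $Q(G;G^{(m)})$ with respect to $G$ to obtain $G^{(m+1)}$. We suggest the following penalty function in practice:

$$p_n(G) = -a_n\sum_{j=1}^{p}\{\mathrm{tr}(S_x\Sigma_j^{-1}) + \log|\Sigma_j|\}, \qquad (2)$$

with $S_x$ being the sample covariance matrix and $\mathrm{tr}(\cdot)$ the trace function. Using this penalty function, $Q(G;G^{(m)})$ is maximized at $G = G^{(m+1)}$ with

$$\pi_j^{(m+1)} = \frac{1}{n}\sum_{i=1}^{n}w_{ij}^{(m+1)}, \qquad \boldsymbol{\mu}_j^{(m+1)} = \frac{\sum_{i=1}^{n}w_{ij}^{(m+1)}\mathbf{x}_i}{n\pi_j^{(m+1)}}, \qquad \Sigma_j^{(m+1)} = \frac{2a_nS_x + S_j^{(m+1)}}{2a_n + n\pi_j^{(m+1)}},$$

where

$$S_j^{(m+1)} = \sum_{i=1}^{n}w_{ij}^{(m+1)}(\mathbf{x}_i-\boldsymbol{\mu}_j^{(m+1)})(\mathbf{x}_i-\boldsymbol{\mu}_j^{(m+1)})^{\top}.$$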

From a Bayesian point of view, the penalty function (2) puts an inverse Wishart prior on $\Sigma_j$ (equivalently, a Wishart prior on $\Sigma_j^{-1}$), and $S_x$ is the mode of this prior distribution. Increasing the value of $a_n$ expresses a stronger conviction that $\Sigma_j$ is close to $S_x$.
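The mode claim can be checked directly for penalty (2). The sketch below (NumPy assumed; `penalty` is a hypothetical name) evaluates (2) for one component with $\Sigma_j = cS_x$: the penalty then equals $-a_n\{d/c + d\log c + \log|S_x|\}$, which is largest at $c = 1$. It also diverges to $-\infty$ as $\Sigma_j$ approaches singularity, because $\mathrm{tr}(S_x\Sigma_j^{-1})$ blows up.

```python
import numpy as np

def penalty(sigmas, S_x, a_n):
    """Penalty (2): -a_n * sum_j {tr(S_x Sigma_j^{-1}) + log|Sigma_j|}."""
    total = 0.0
    for s in sigmas:
        sign, logdet = np.linalg.slogdet(s)
        if sign <= 0:
            return -np.inf                                   # singular (or invalid) Sigma_j
        total += np.trace(np.linalg.solve(s, S_x)) + logdet  # tr(Sigma_j^{-1} S_x) = tr(S_x Sigma_j^{-1})
    return -a_n * total

S_x = np.array([[2.0, 0.3], [0.3, 1.0]])
for c in [0.5, 1.0, 2.0]:
    print(c, penalty([c * S_x], S_x, a_n=0.1))      # largest at c = 1, i.e. Sigma_j = S_x
print(penalty([1e-8 * np.eye(2)], S_x, a_n=0.1))    # very negative near singularity
```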

The EM-algorithm iterates between the E-step and the M-step. The penalized likelihood increases after each iteration. At the same time, the penalized likelihood is bounded from above over the parameter space. Hence, the EM-algorithm converges to a non-degenerate local maximum. This is the dividing line between the penalized likelihood and the ordinary likelihood. In both cases, the EM-algorithm may converge to an undesired local maximum when started from a poor initial value. In the simulations, we use ten initial values, including the true value, for each data set to control this potential problem.
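Combining the E-step, the closed-form M-step and penalty (2) gives the following sketch of the penalized EM-algorithm. It assumes NumPy; the names are hypothetical, $S_x$ is computed with `np.cov` (divisor $n-1$), $a_n = 1/n$ in the example is an arbitrary illustrative choice, and a single initial value is used rather than the ten used in the simulations. The function records $pl_n(G^{(m)})$ at every iteration so that the monotone increase stated above can be checked.

```python
import numpy as np

def component_logdens(X, mus, sigmas):
    """(n, p) array of log phi(x_i; mu_j, Sigma_j)."""
    n, d = X.shape
    out = np.empty((n, len(mus)))
    for j, (m, s) in enumerate(zip(mus, sigmas)):
        L = np.linalg.cholesky(s)
        z = np.linalg.solve(L, (X - m).T)
        out[:, j] = (-0.5 * d * np.log(2 * np.pi) - np.log(np.diag(L)).sum()
                     - 0.5 * (z ** 2).sum(axis=0))
    return out

def penalized_em(X, pis, mus, sigmas, a_n, n_iter=500, tol=1e-10):
    """EM for pl_n(G) = l_n(G) + p_n(G) with penalty (2); returns the fit and the pl_n path."""
    n, d = X.shape
    S_x = np.cov(X, rowvar=False)                  # sample covariance matrix S_x
    pis = np.asarray(pis, dtype=float)
    mus = [np.asarray(m, dtype=float) for m in mus]
    sigmas = [np.asarray(s, dtype=float) for s in sigmas]
    path = []
    for _ in range(n_iter):
        logj = np.log(pis) + component_logdens(X, mus, sigmas)
        top = logj.max(axis=1, keepdims=True)
        lse = top + np.log(np.exp(logj - top).sum(axis=1, keepdims=True))
        pen = -a_n * sum(np.trace(np.linalg.solve(s, S_x)) + np.linalg.slogdet(s)[1]
                         for s in sigmas)
        path.append(float(lse.sum()) + pen)        # pl_n at the current G^(m)
        if len(path) > 1 and path[-1] - path[-2] < tol:
            break
        w = np.exp(logj - lse)                     # E-step: w_ij^(m+1)
        nj = w.sum(axis=0)                         # n * pi_j^(m+1)
        pis = nj / n
        mus = [w[:, j] @ X / nj[j] for j in range(len(nj))]
        sigmas = []
        for j in range(len(nj)):
            R = X - mus[j]
            S_j = (w[:, j, None] * R).T @ R        # S_j^(m+1)
            sigmas.append((2 * a_n * S_x + S_j) / (2 * a_n + nj[j]))
    return pis, mus, sigmas, path

rng = np.random.default_rng(1)
X = np.vstack([rng.multivariate_normal([0.0, -3.0], np.eye(2), size=60),
               rng.multivariate_normal([0.0, 3.0], np.diag([1.0, 5.0]), size=140)])
pis, mus, sigmas, path = penalized_em(
    X, [0.5, 0.5], [[0.0, -1.0], [0.0, 1.0]], [np.eye(2), np.eye(2)], a_n=1.0 / len(X))
print(np.round(pis, 3))
```

Because every $\Sigma_j^{(m+1)}$ contains the term $2a_nS_x$, the updated covariance matrices stay positive definite whenever $S_x$ is, which is how the penalty keeps the iterations away from degenerate solutions.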
3 Simulation study



When computing the MLE, we first remove the local maxima located by the EM-algorithm that have degenerate covariance matrices. The local maximum attaining the largest likelihood value among those remaining is then identified as the MLE, or the ratified MLE, of the mixing distribution. Although this approach lacks solid theoretical support, it works well for univariate normal mixture models [2]. The consistency result for the PMLE for multivariate normal mixture models does not guarantee its superiority in practice. Thus, we feel obliged to compare the performance of the PMLE with that of the ratified MLE. In addition, there is a general shortage of thorough simulation studies in the context of multivariate normal mixture models; this paper partially fills that knowledge gap.

We use bias and standard deviation to measure the accuracy of the ratified MLE and the PMLE. We also record the number of times that the EM-algorithm degenerates when the ratified MLE is attempted. For clarity, the simulation results are organized into two subsections.



3.1 Simulation models and settings 



The size of the parameter space for the finite multivariate normal mixture model explodes with the dimension, so a few specific distributions cannot cover all aspects of this model. We therefore selected a few particularly important cases in four categories of mixture models: two-component bivariate normal mixture models ($p = 2$, $d = 2$); three-component bivariate normal mixture models ($p = 3$, $d = 2$); two-component trivariate normal mixture models ($p = 2$, $d = 3$); and three-component trivariate normal mixture models ($p = 3$, $d = 3$).

In each category, we chose $3\times 6 = 18$ models, formed by combining three mean-vector configurations with six covariance-matrix configurations. These combinations mimic practical situations and make the comparison of the performance of the ratified MLE and the PMLE meaningful.

The covariance matrices in the simulation models are designed to have the following general form when $d = 2$:

$$\Sigma = \begin{pmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{pmatrix}\begin{pmatrix}\lambda_1 & 0\\ 0 & \lambda_2\end{pmatrix}\begin{pmatrix}\cos\theta & \sin\theta\\ -\sin\theta & \cos\theta\end{pmatrix}.$$

By the choices of the eigenvalues $\lambda_1$, $\lambda_2$ and the orientation angle $\theta$, we obtain various configurations of bivariate normal mixture models.
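A small helper (NumPy assumed; `cov2d` is a hypothetical name) that builds $\Sigma$ from $(\lambda_1,\lambda_2,\theta)$ exactly as in the display above:

```python
import numpy as np

def cov2d(lam1, lam2, theta):
    """Sigma = R(theta) diag(lam1, lam2) R(theta)^T with R(theta) a 2-D rotation."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    return R @ np.diag([lam1, lam2]) @ R.T

print(cov2d(1.0, 5.0, np.pi / 4))
```

For example, eigenvalues (1, 5) rotated by $\pi/4$ give the matrix with diagonal entries 3 and off-diagonal entries $-2$; the eigenvalues are unchanged by the rotation.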



The covariance matrices in the simulation models are designed to have the following general form when $d = 3$:

$$\Sigma = P(\alpha,\beta,\gamma)\,\mathrm{diag}[\lambda_1,\lambda_2,\lambda_3]\,P^{\top}(\alpha,\beta,\gamma)$$

with

$$P(\alpha,\beta,\gamma) = \begin{pmatrix}
\cos\alpha\cos\gamma - \cos\beta\sin\alpha\sin\gamma & -\cos\beta\cos\gamma\sin\alpha - \cos\alpha\sin\gamma & \sin\alpha\sin\beta\\
\cos\gamma\sin\alpha + \cos\alpha\cos\beta\sin\gamma & \cos\alpha\cos\beta\cos\gamma - \sin\alpha\sin\gamma & -\cos\alpha\sin\beta\\
\sin\beta\sin\gamma & \cos\gamma\sin\beta & \cos\beta
\end{pmatrix},$$

that is, a $3\times 3$ rotation matrix. For each multivariate normal mixture model, we specify the mixing proportion, covariance matrix, and mean vector for each component.
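The sketch below (NumPy assumed; the names `rotation3` and `cov3d` are hypothetical) codes $P(\alpha,\beta,\gamma)$ entry by entry from the display above. Multiplying out shows that this matrix equals $R_z(\alpha)R_x(\beta)R_z(\gamma)$, a z-x-z Euler-angle rotation; that identification is ours rather than a statement from the paper.

```python
import numpy as np

def rotation3(alpha, beta, gamma):
    """P(alpha, beta, gamma), entry by entry; equals Rz(alpha) @ Rx(beta) @ Rz(gamma)."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    return np.array([
        [ca * cg - cb * sa * sg, -cb * cg * sa - ca * sg,  sa * sb],
        [cg * sa + ca * cb * sg,  ca * cb * cg - sa * sg, -ca * sb],
        [sb * sg,                 cg * sb,                 cb],
    ])

def cov3d(lams, angles):
    """Sigma = P diag(lams) P^T for eigenvalues lams and angles (alpha, beta, gamma)."""
    P = rotation3(*angles)
    return P @ np.diag(lams) @ P.T

# Eigenvalues (1, 3, 10) with angles (-pi, pi, pi)/3.
Sigma = cov3d((1, 3, 10), (-np.pi / 3, np.pi / 3, np.pi / 3))
```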



Two-component bivariate normal mixture models. We set the component proportions $(\pi_1,\pi_2) = (0.3, 0.7)$. No other cases are considered.

Due to the invariance property of the multivariate normal distribution, the distance between the two mean vectors is the only configuration that can make a difference. Thus, we simulated only three pairs of mean vectors, representing the situations where the two component mean vectors are in near, moderate, and distant locations, as in the following table:

Mean configuration | Component 1 | Component 2
near | (0, -1) | (0, 1)
moderate | (0, -3) | (0, 3)
distant | (0, -5) | (0, 5)


Many features of the pair of covariance matrices may affect the performance of the ratified MLE or the PMLE. The eigenvalues matter most through their ratio $\lambda_2/\lambda_1$. The angle $\theta$ determines the relative orientation of the two component densities. Our choices based on these considerations are given in the following table:





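Configuration | Component 1 $(\lambda_1, \lambda_2, \theta)$ | Component 2 $(\lambda_1, \lambda_2)$
1 | (1, 1, 0) | (1, 1)
2 | (1, 5, 0) | (1, 1)
3 | (1, 5, $\pi/4$) | (1, 1)
4 | (1, 5, $\pi/2$) | (1, 1)
5 | (1, 5, $\pi/4$) | (1, 5)
6 | (1, 5, $\pi/2$) | (1, 5)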



Three-component bivariate normal mixture models. We set the component proportions $(\pi_1,\pi_2,\pi_3) = (0.15, 0.35, 0.50)$. The three mean vectors may form a straight line, an acute triangle, or an obtuse triangle. We select three representative configurations as follows:

Mean configuration | Component 1 | Component 2 | Component 3
straight | (0, -2) | (0, 0) | (0, 2)
acute | (0, -2) | (3, 0) | (0, 2)
obtuse | (0, -2) | (1, 0) | (0, 2)



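We select six triplets of covariance matrices as follows, where each entry gives $(\lambda_1, \lambda_2, \theta)$:

Configuration | Component 1 | Component 2 | Component 3
1 | (1, 1, 0) | (1, 1, 0) | (1, 1, 0)
2 | (1, 1, 0) | (1, 1, 0) | (1, 5, 0)
3 | (1, 1, 0) | (1, 5, 0) | (1, 5, $\pi/4$)
4 | (1, 1, 0) | (1, 5, 0) | (1, 5, $\pi/2$)
5 | (1, 5, 0) | (1, 5, $\pi/4$) | (1, 5, $-\pi/4$)
6 | (1, 5, 0) | (1, 5, $\pi/4$) | (1, 5, $-\pi/2$)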



Two-component trivariate normal mixture models. We again let $(\pi_1,\pi_2) = (0.3, 0.7)$. Again, only the distance between the two mean vectors matters. The two mean vectors are chosen to be:

Mean configuration | Component 1 | Component 2
near | (0, 0, -1) | (0, 0, 1)
moderate | (0, 0, -3) | (0, 0, 3)
distant | (0, 0, -5) | (0, 0, 5)



The covariance matrix pairs are chosen as follows, where each component is given by its eigenvalues $(\lambda_1,\lambda_2,\lambda_3)$ followed by its rotation angles $(\alpha,\beta,\gamma)$:

Configuration | Component 1 | Component 2
1 | (1, 1, 1), (0, 0, 0) | (1, 1, 1), (0, 0, 0)
2 | (1, 1, 1), (0, 0, 0) | (1, 3, 10), (0, 0, 0)
3 | (1, 3, 10), (0, 0, 0) | (1, 3, 10), (0, 0, 0)
4 | (1, 3, 10), (0, 0, 0) | (1, 3, 10), $(-\pi, \pi, \pi)/3$
5 | (1, 3, 10), (0, 0, 0) | (1, 3, 10), $(\pi, -\pi, \pi)/3$
6 | (1, 3, 10), (0, 0, 0) | (1, 3, 10), $(\pi, \pi, -\pi)/3$



Three-component trivariate normal mixture models. We let the component proportions $(\pi_1,\pi_2,\pi_3)$ be $(0.15, 0.35, 0.50)$. Recall that any three points lie in one plane. Thus, the invariance property of the normal distribution allows us to set the first entry of each mean vector to 0:

Mean configuration | Component 1 | Component 2 | Component 3
straight | (0, 0, -2) | (0, 0, 0) | (0, 0, 2)
acute | (0, 0, -2) | (0, 3, 0) | (0, 0, 2)
obtuse | (0, 0, -2) | (0, 1, 0) | (0, 0, 2)
The covariance matrix triplets are chosen as follows, where each component is given by its eigenvalues $(\lambda_1,\lambda_2,\lambda_3)$ followed by its rotation angles $(\alpha,\beta,\gamma)$:

Configuration | Component 1 | Component 2 | Component 3
1 | (1, 1, 1), (0, 0, 0) | (1, 1, 1), (0, 0, 0) | (1, 1, 1), (0, 0, 0)
2 | (1, 1, 1), (0, 0, 0) | (1, 1, 1), (0, 0, 0) | (1, 3, 10), (0, 0, 0)
3 | (1, 1, 1), (0, 0, 0) | (1, 3, 10), (0, 0, 0) | (1, 3, 10), $(-\pi, \pi, \pi)/3$
4 | (1, 1, 1), (0, 0, 0) | (1, 3, 10), (0, 0, 0) | (1, 3, 10), $(\pi, -\pi, \pi)/3$
5 | (1, 3, 10), (0, 0, 0) | (1, 3, 10), $(-\pi, \pi, \pi)/3$ | (1, 3, 10), $(\pi, -\pi, \pi)/3$
6 | (1, 3, 10), (0, 0, 0) | (1, 3, 10), $(\pi, -\pi, \pi)/3$ | (1, 3, 10), $(\pi, \pi, -\pi)/3$



We let n = 200 for the two-component bivariate mixtures and n = 300 for the 
other mixtures to ensure a reasonable estimation of the mixing distribution. 
We generate 1000 data sets for each model. 
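Data sets for a given model can be drawn by first sampling the component labels and then the normal observations. A minimal sketch (NumPy assumed; `sample_mixture` is a hypothetical name); the identity covariances in the example are illustrative only.

```python
import numpy as np

def sample_mixture(rng, n, pis, mus, sigmas):
    """Draw n observations from the normal mixture (1); also return the component labels."""
    labels = rng.choice(len(pis), size=n, p=pis)      # component memberships
    X = np.empty((n, len(mus[0])))
    for j in range(len(pis)):
        idx = labels == j
        X[idx] = rng.multivariate_normal(mus[j], sigmas[j], size=int(idx.sum()))
    return X, labels

# Two-component bivariate setting with the "near" means and n = 200; 1000 replicates.
rng = np.random.default_rng(2024)
pis = [0.3, 0.7]
mus = [np.array([0.0, -1.0]), np.array([0.0, 1.0])]
sigmas = [np.eye(2), np.eye(2)]
datasets = [sample_mixture(rng, 200, pis, mus, sigmas)[0] for _ in range(1000)]
```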

We have presented four categories of finite normal mixture models. For ease of reference, we use, for example, I.1.2 to refer to the model from Category I with mean vector configuration 1 and covariance matrix configuration 2. Even though simulation studies are needed for many more mixing distribution configurations, there is a limit to how much one paper can achieve. We do not consider the case where the order $p$ is unknown. All estimators are expected to be poor in this case, although the consistency result for the PMLE remains true as long as an upper bound on the order is known.

Penalty term and initial values. We compute the ratified MLE and two penalized MLEs corresponding to $a_n = n^{…}$ and $a_n = …$ in (2). We call these MLE, PMLE1, and PMLE2, respectively.

The ten initial values are chosen from two groups. The first group includes the true mixing distribution and four others obtained by perturbing the component mean vectors of the true mixing distribution. The second group is data-based. We first calculate the sample mean vector and the sample covariance matrix. Then we set the mixing proportions all equal to $1/p$ and the component covariance matrices all equal to the sample covariance matrix. We then apply a similar perturbation to the sample mean vector to obtain another five sets of initial values.
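The two groups of initial values can be sketched as follows (NumPy assumed; `initial_values` is a hypothetical name). The paper does not specify the perturbation, so the Gaussian noise with standard deviation `scale` added to the mean vectors is an assumption.

```python
import numpy as np

def initial_values(rng, X, true_G, scale=0.5):
    """Ten starting values: true G, four perturbed copies of it, five data-based starts.
    The Gaussian perturbation of the mean vectors (sd = `scale`) is an assumption."""
    pis0, mus0, sigmas0 = true_G
    p = len(pis0)
    starts = [true_G]                                   # group 1: the true mixing distribution
    for _ in range(4):                                  # group 1: perturbed true mean vectors
        starts.append((pis0,
                       [m + rng.normal(scale=scale, size=m.shape) for m in mus0],
                       sigmas0))
    xbar, S_x = X.mean(axis=0), np.cov(X, rowvar=False)
    for _ in range(5):                                  # group 2: equal weights, S_x, perturbed xbar
        mus = [xbar + rng.normal(scale=scale, size=xbar.shape) for _ in range(p)]
        starts.append((np.full(p, 1.0 / p), mus, [S_x.copy() for _ in range(p)]))
    return starts
```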
3.2 Simulation results



Number of Degeneracies. When the EM-algorithm converges to a mixing distribution with singular component covariance matrices, we say that it degenerates. The EM-algorithm for the PMLE does not degenerate; this is ensured by theory. Regardless of the quality of the initial value, the corresponding EM-algorithm always converges to some non-degenerate local maximum. The PMLE is therefore a good estimator if the largest local maximum is a good estimator.

When computing the ratified MLE, the EM-algorithm sometimes converges to 
a degenerate local maximum. We recorded the number of times that the EM- 
algorithm degenerated while computing the ratified MLE in our simulation. 
Since each data set had ten initial values, the number of degenerate outcomes 
is out of 10,000 for each entry. 
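Degeneracy has to be detected numerically. The sketch below (NumPy assumed; the names are hypothetical) flags a fitted mixing distribution whose smallest covariance eigenvalue falls below a threshold; both the criterion and the threshold `tol` are hypothetical choices, not taken from the paper.

```python
import numpy as np

def is_degenerate(sigmas, tol=1e-8):
    """True if any fitted component covariance is (numerically) singular or non-finite."""
    for s in sigmas:
        if not np.all(np.isfinite(s)) or np.linalg.eigvalsh(s).min() <= tol:
            return True
    return False

def count_degenerate(fits):
    """fits: fitted (pis, mus, sigmas) triples, one per data set and initial value."""
    return sum(is_degenerate(sigmas) for _, _, sigmas in fits)
```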

For two-component bivariate normal mixture models, it is immediately clear that the number of degenerate outcomes increases when the mean vectors are more widely separated. The covariance structure is also important. For example, when the eigenvectors of one covariance matrix are rotated by an angle of $\pi/2$ (variance configurations 4 and 6), so that the two clusters of observations become more mixed, the number of degenerate outcomes declines. This observation is somewhat counter-intuitive but can be explained as follows. The success of the EM-algorithm depends heavily on sensible initial values. When the two mean vectors are close and the components are well mixed, the choice of initial values matters less. However, when the two mean vectors are distant, the location of the initial mean vectors is crucial. Thus, the degenerate outcomes were mostly due to the second group of initial values.

In the other three categories, the above phenomenon persists: the frequency of degeneracy increases when the components are more widely separated. In addition, for these categories we observe a higher frequency of degeneracy on average. We believe this is because the EM-algorithm is more sensitive to the quality of the initial values when the mixture models are more complicated.

Degeneracy of the EM-algorithm should not be a serious problem for the ratified MLE, as long as the non-degenerate outcomes of the algorithm provide good estimates. We hence proceed to examine the bias and variance properties of the PMLE and of the largest non-degenerate local maximum, regarded as the ratified MLE.

Bias and Standard Deviation. We compute the element-wise mean bias 
and standard deviation based on 1000 simulated samples from each model. 
We present only a subset of representative outcomes from each category; the 
complete set is available upon request. 
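The element-wise summaries can be computed as below (NumPy assumed; `bias_and_sd` is a hypothetical name). The sketch assumes the component labels of each estimate have already been matched to those of the true mixing distribution, since label switching would otherwise inflate both quantities; it uses the $n-1$ divisor for the standard deviation.

```python
import numpy as np

def bias_and_sd(estimates, truth):
    """Element-wise mean bias and standard deviation across simulated data sets.

    estimates: (R, k) array, one row of k parameter estimates per replicate,
               assumed already matched to the true component labels.
    truth:     length-k vector of true parameter values.
    """
    est = np.asarray(estimates, dtype=float)
    bias = est.mean(axis=0) - np.asarray(truth, dtype=float)
    sd = est.std(axis=0, ddof=1)
    return bias, sd

bias, sd = bias_and_sd([[0.29, -1.1], [0.33, -0.9], [0.31, -1.0]], [0.3, -1.0])
```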
Two representative outcomes, for models I.1.1 and I.2.4 in Category I, are given in Table 2. For the parameters in component 1 of Model I.1.1, the standard deviation of PMLE2 is about 10% lower than that of the ratified MLE or PMLE1. The same is true for Models I.1.5 and I.1.6 (not presented). PMLE2 also has a relatively lower bias in these models. The results for the remaining models are comparable to those for I.2.4: there is little appreciable difference between the three estimation methods.

The biases of all three estimators for estimating $\mu_2$ are high under I.1.1 and I.1.5, in which the two mean vectors are lined up in the $\mu_1$ direction. Due to the orientation of the two component covariance matrices, it is hard to tell the two mean vectors apart. The biases and standard deviations for estimating $\sigma_{22}$ under I.1.1, I.1.2, ..., I.1.6 are also high or relatively high.

Table 2 about here.

We present outcomes for two models (II.1.1 and II.2.4) in Category II in Tables 3 and 4. For both models, for the parameters in component 1, there is a 10% to 20% reduction in the standard deviation for PMLE2 compared to the other two estimators. The bias of PMLE2 is also lower. Some reductions in components 2 and 3 are also noticed, but to varying degrees. In the other models, the performance of PMLE2 does not dominate that of the ratified MLE or PMLE1.

Under a straight-line configuration of the component mean vectors, the bias for estimating $\mu_2$ is relatively high. For a triangle configuration, the roles of $\mu_1$ and $\mu_2$ are no longer different. This bias problem is not estimator dependent, although PMLE2 helps slightly.

The estimation of $\sigma_{22}$ again comes with both higher bias and higher standard deviation in general. For this category of models, the problem spreads into other parts of the covariance matrix.

Tables 3 and 4 about here.

We report simulation results for three models (III.1.1, III.2.4 and III.3.6) in Category III in Tables 5, 6 and 7. We again observe that PMLE2 has smaller bias and standard deviation for estimating the parameters in the first component, where the mixing proportion is small, and in model III.1.1, where the two mean vectors are close. The gain is as much as 30% for $\sigma_{33}$.

The gains seem to disappear when the two component mean vectors are far from each other. Nevertheless, PMLE2 still appears to be the best estimator in terms of both bias and standard deviation.
Tables 5, 6 and 7 about here.



We report simulation results for three models (IV.1.1, IV.2.4 and IV.3.6) in Category IV in Tables 8, 9 and 10. Again, PMLE2 has the lowest standard deviations for estimating the parameters in the first component, where the mixing proportion is small. The comparison is sharpest in model IV.2.4 for $\sigma_{13}$. In contrast to the other categories, the superiority of PMLE2 here is widespread: PMLE2 is also superior for the parameters in component 2, while the results are mixed for the parameters in component 3.

We caution that even the best estimator is not necessarily a good estimator for trivariate mixture models. Overall, none of the three estimators does a great job of estimating the mixing distributions, possibly because of the fundamental nature of the problem, e.g., small Fisher information for high-dimensional multivariate normal mixture models. This problem is expected to diminish as the sample size increases.

Tables 8, 9 and 10 about here.

Summary of the simulation results. To conclude, the penalized likelihood estimators, both PMLE1 and PMLE2, are completely free from degeneracy problems. Moreover, PMLE2 has the best general performance in terms of bias and standard deviation. This is most obvious when the components are not well separated. In applications, when the superior PMLE2 is available, it is unnecessary to first judge whether it is safe to use the ratified MLE. Although we do not completely dismiss the use of the ratified MLE, it is clearly advantageous to use PMLE2 outright. We further caution against the use of high-dimensional multivariate normal mixture models in practice when the sample size is not large. In these situations, even the best performing estimator may not be a good estimator.
Appendix

The ordinary likelihood function is unbounded because, when the covariance matrix of a kernel density becomes close to singular, the likelihood contribution of the observations near its mean vector goes to infinity. Thus, a key step in our proof is to assess the number of such observations. In the univariate case, Chen et al. [2] obtained the following result:

Lemma 1: Assume that $x_1, x_2, \ldots, x_n$ is a random sample from a finite normal mixture distribution with density $f(x)$, $x\in\mathbb{R}$. Let $F_n$ be the empirical distribution function and define $M = \max\{\sup_x f(x), 8\}$ and $\delta_n(\sigma) = -M\sigma\log(\sigma) + n^{-1}$. Except for a zero-probability event not depending on $\sigma$, we have for all large enough $n$:

(a) for $\sigma$ between $8/(nM)$ and $\exp(-2)$,

$$\sup_{\mu}[F_n(\mu - \sigma\log\sigma) - F_n(\mu)] \le 4\delta_n(\sigma);$$

(b) for $\sigma$ between 0 and $8/(nM)$,

$$\sup_{\mu}[F_n(\mu - \sigma\log\sigma) - F_n(\mu)] \le 2n^{-1}(\log n)^2.$$

The consistency result for the multivariate normal mixture model is built on 
a generalized result. More specifically, the following lemma gives a bound for 
the multivariate normal mixture model: 

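Lemma 2: Let $\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_n$ be a random sample from a $d$-dimensional multivariate normal mixture model with $p$ components such that its density function is given by

$$f(\mathbf{x};G_0) = \sum_{j=1}^{p}\pi_{j0}\phi(\mathbf{x};\boldsymbol{\mu}_{j0},\Sigma_{j0}).$$

Assume that all $\Sigma_{j0}$ are positive definite. For any mean and covariance matrix pair $(\boldsymbol{\mu},\Sigma)$ such that $|\Sigma| < \exp(-4d)$, except for a zero-probability event not depending on $(\boldsymbol{\mu},\Sigma)$, we have, for $n$ large enough,

$$H_n(\boldsymbol{\mu},\Sigma) = \sum_{i=1}^{n} I\{(\mathbf{x}_i-\boldsymbol{\mu})^{\top}\Sigma^{-1}(\mathbf{x}_i-\boldsymbol{\mu}) < (\log|\Sigma|)^2\} \le 4(\log n)^2 I(|\Sigma| < \alpha_n) + 8n\delta_n(|\Sigma|)I(\alpha_n \le |\Sigma|),$$

where

$$\alpha_n = (8/M)^{2d}n^{-2d}, \qquad \delta_n(|\Sigma|) = -M|\Sigma|^{1/(2d)}\log|\Sigma| + n^{-1},$$

and $M = \max\{8, \lambda_0^{-1/2}\}$, with $\lambda_0$ being the smallest eigenvalue among those of $\Sigma_{j0}$, $j = 1,2,\ldots,p$.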
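Proof of Lemma 2: Let $0 < \lambda_1 \le \lambda_2 \le \cdots \le \lambda_d$ and $(\mathbf{a}_1,\ldots,\mathbf{a}_d)$ be the eigenvalues and the corresponding unit-length eigenvectors of $\Sigma$. Since $|\Sigma| < 1$, we have $-\log|\Sigma| > 0$ and

$$\begin{aligned}
\{\mathbf{x}: (\mathbf{x}-\boldsymbol{\mu})^{\top}\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu}) < (\log|\Sigma|)^2\}
&= \Big\{\mathbf{x}: \sum_{j=1}^{d}\lambda_j^{-1}|\mathbf{a}_j^{\top}(\mathbf{x}-\boldsymbol{\mu})|^2 < (\log|\Sigma|)^2\Big\}\\
&\subset \{\mathbf{x}: |\mathbf{a}_j^{\top}(\mathbf{x}-\boldsymbol{\mu})| < -\sqrt{\lambda_j}\log|\Sigma|,\ j = 1,\ldots,d\}\\
&\subset \{\mathbf{x}: |\mathbf{a}_1^{\top}(\mathbf{x}-\boldsymbol{\mu})| < -\sqrt{\lambda_1}\log|\Sigma|\}.
\end{aligned}$$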
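The inclusion above can be checked numerically. The sketch below (NumPy assumed) draws a random nearly singular $\Sigma$ with $\log|\Sigma| < 0$, finds the points inside the ellipsoid $\{(\mathbf{x}-\boldsymbol{\mu})^{\top}\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu}) < (\log|\Sigma|)^2\}$, and checks that each of them satisfies $|\mathbf{a}_1^{\top}(\mathbf{x}-\boldsymbol{\mu})| < -\sqrt{\lambda_1}\log|\Sigma|$. The particular $\Sigma$ and sampling scale are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 3
A = rng.normal(size=(d, d))
Sigma = 1e-3 * (A @ A.T + 0.1 * np.eye(d))   # a nearly singular Sigma with log|Sigma| < 0
lam, vecs = np.linalg.eigh(Sigma)            # ascending, so lam[0] = lambda_1 and vecs[:, 0] = a_1
logdet = np.linalg.slogdet(Sigma)[1]
mu = np.zeros(d)

X = rng.normal(scale=0.5, size=(100_000, d))
R = X - mu
maha = np.einsum("ij,jk,ik->i", R, np.linalg.inv(Sigma), R)   # (x - mu)^T Sigma^{-1} (x - mu)
inside = maha < logdet ** 2                                   # the ellipsoid in Lemma 2
proj = np.abs(R @ vecs[:, 0])                                 # |a_1^T (x - mu)|
print(inside.sum(), bool(np.all(proj[inside] < -np.sqrt(lam[0]) * logdet)))
```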

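Furthermore, let

$$\mathcal{Q} = \{\mathbf{b}_t : t = 1, 2, \ldots\}$$

be a sequence of unit vectors such that $\mathcal{Q}$ forms a dense subset of the unit vectors in $\mathbb{R}^d$. Hence, for any given $\mathbf{a}_1$ and any bounded subset $B \subset \mathbb{R}^d$, we can find a vector $\mathbf{b}$ in $\mathcal{Q}$ close enough to $\mathbf{a}_1$ that

$$\{\mathbf{x}\in B: |\mathbf{a}_1^{\top}(\mathbf{x}-\boldsymbol{\mu})| < -\sqrt{\lambda_1}\log|\Sigma|\} \subset \{\mathbf{x}\in B: |\mathbf{b}^{\top}(\mathbf{x}-\boldsymbol{\mu})| < -\sqrt{2\lambda_1}\log|\Sigma|\}.$$

Based on this observation, we get

$$\sup_{\boldsymbol{\mu}} H_n(\boldsymbol{\mu},\Sigma) = \sup_{\boldsymbol{\mu}}\sum_{i=1}^{n} I\{(\mathbf{x}_i-\boldsymbol{\mu})^{\top}\Sigma^{-1}(\mathbf{x}_i-\boldsymbol{\mu}) < (\log|\Sigma|)^2\} \le \sup_{\mathbf{b}\in\mathcal{Q}}\sup_{\boldsymbol{\mu}}\sum_{i=1}^{n} I\{|\mathbf{b}^{\top}(\mathbf{x}_i-\boldsymbol{\mu})| < \sqrt{2\lambda_1}\,|\log|\Sigma||\}.$$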

On the other hand, given any non-random unit vector $\mathbf{b}$, $\mathbf{b}^{\top}\mathbf{x}_i$, $i = 1,2,\ldots,n$, is a random sample from the univariate normal mixture model with density

$$f^{\mathbf{b}}(x) = \sum_{j=1}^{p}\pi_{j0}\phi(x;\mathbf{b}^{\top}\boldsymbol{\mu}_{j0},\mathbf{b}^{\top}\Sigma_{j0}\mathbf{b}).$$

We remark that since some pairs $(\mathbf{b}^{\top}\boldsymbol{\mu}_{j0},\mathbf{b}^{\top}\Sigma_{j0}\mathbf{b})$ can be equal, this univariate mixture distribution can have fewer than $p$ components. This does not affect the following derivation. Recall that $\lambda_0$ is the smallest eigenvalue among those of $\Sigma_{j0}$, $j = 1,\ldots,p$. Then

$$\sup_{\mathbf{b}}\sup_{x} f^{\mathbf{b}}(x) \le \sup_{\mathbf{b}}\max\{(\mathbf{b}^{\top}\Sigma_{j0}\mathbf{b})^{-1/2}, j = 1,\ldots,p\} = \lambda_0^{-1/2}.$$

Applying Lemma 1 to the univariate data $\mathbf{b}^{\top}\mathbf{x}_i$, $i = 1,\ldots,n$, except for a zero-probability event not depending on $\Sigma$, as $n\to\infty$, we have

$$\sup_{\boldsymbol{\mu}}\sum_{i=1}^{n} I\{|\mathbf{b}^{\top}(\mathbf{x}_i-\boldsymbol{\mu})| < \sqrt{2\lambda_1}\,|\log|\Sigma||\} \le 4(\log n)^2 I(|\Sigma| < \alpha_n) + 8n\delta_n(|\Sigma|)I(\alpha_n \le |\Sigma|).$$

The conclusion of the lemma simply claims that the above inequality holds over all $\mathbf{b}\in\mathcal{Q}$ with only a zero-probability exception. The zero-probability claim remains true because $\mathcal{Q}$ is countable.

Proof of Theorem 1: We give a proof for the case $p = 2$; the proof for the general case is similar. Let $\Gamma$ be the parameter space for $G$ and define

$$\begin{aligned}
\Gamma_1 &= \{G\in\Gamma: |\Sigma_1| \le |\Sigma_2| \le \epsilon_0\},\\
\Gamma_2 &= \{G\in\Gamma: |\Sigma_1| \le \tau_0,\ |\Sigma_2| > \epsilon_0\},\\
\Gamma_3 &= \Gamma - (\Gamma_1\cup\Gamma_2),
\end{aligned}$$

where $\epsilon_0 > \tau_0 > 0$ are two small positive constants to be specified soon. The first subspace represents the case where both components have nearly singular covariance matrices. Hence the observations inside the small ellipsoids centered at the mean parameters make a large contribution to the log-likelihood function.

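Let $K_0 = E_0\{\log f(\mathbf{X};G_0)\}$. The constants $\epsilon_0$ and $\tau_0$ must satisfy the following four conditions:

1: $0 < \epsilon_0 < \exp(-4d)$;

2: $-\log\epsilon_0 - (\log\epsilon_0)^2 \le 2(K_0 - 2)$;

3: $16M\epsilon_0^{1/(2d)}(\log\epsilon_0)^2 < 1$;

4: $16M\tau_0^{1/(2d)}(\log\tau_0)^2 < \delta_0$;

for some $\delta_0 > 0$ to be specified. The existence of such $\epsilon_0$ and $\tau_0$ is obvious.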

We proceed with the proof in three steps. 

Step 1. For any G e Fi, we show that almost surely, 

sup plniG) -plniGo) -00. 

Define two index sets 

A^{i:{x,- fiiyj:^\x, - /ii) < (log |Ei|)2}, 
B^{i:{x,- ^i2yT.:^\x, - fi2) < (log IE2I)'}, 
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and for any index set S e {1,2, ... ,n}, denote 

UG; 5)=i:iog/(X„ G). 
ies 

We can write /„(G') = A) + l^{G; A^E) + /„(G'; A^E'^), where and 5^= 

are the complement sets of A and S respectively. For any index set 5", denote 
n{S) as its cardinahty. It is easy to see that 

ln{G; A)<n{A)log\Ei\--^, 

ln{G; S) <n(S)log|S2|-i 

Applying Lemma 2 to n{A) and n{B), noting that |Ei| < eo for G in Fi, and 
C3 on the penalty function, we find that 

ln{G; ^)+p„(Ei) < 16dlogn + 8M£o^(log£o)^^ 

ln{G; A'B)+Pn{^2) < IQdlogn + SMet'ilogeoYn. 

The key point underlying the above two inequalities is that they are bounded 
by an arbitrarily small fraction of n. Further, for observations away from ni 
and H2, we have 



UG; A'B') 

< J2 log[7riexp{log|Ei|-^ - -(log|Ei|)2} + 7r2exp{log|E2r^ - -(log|E2 

< E {-^log£o-^(log£o)n 

ieA'=B'= ^ ^ 
<n{Ko-2) 

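The last line in the above derivation is obtained by choosing a small enough $\varepsilon_0$ as specified earlier (conditions 1 and 2). Combining these inequalities with condition 3, we get $pl_n(G) \le n(K_0 - 1) + 32d\log n$ for all $G \in \Gamma_1$. Since $pl_n(G_0) = nK_0 + o(n)$ almost surely (by the strong law of large numbers and $p_n(G_0) = o(n)$), it follows that

$$\sup_{G \in \Gamma_1} pl_n(G) - pl_n(G_0) \le -n + 32d\log n + o(n) \to -\infty$$

almost surely, which completes the first step.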
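The bound for $l_n(G; A^cB^c)$ rests on a pointwise inequality: if $|\Sigma| \le \varepsilon_0$, $\log\varepsilon_0 \le -1$ and $(x - \mu)^\top\Sigma^{-1}(x - \mu) \ge (\log|\Sigma|)^2$, then $\log\phi(x; \mu, \Sigma) \le -\tfrac12\log|\Sigma| - \tfrac12(\log|\Sigma|)^2 \le -\tfrac14\log\varepsilon_0 - \tfrac14(\log\varepsilon_0)^2$, and the log of a convex combination of such terms obeys the same bound. The randomized check below is a sketch, not a proof; the dimension, $\varepsilon_0$ and the way covariance matrices are drawn are arbitrary illustrative choices.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)
d, eps0 = 2, 1e-4  # hypothetical choices with eps0 < exp(-4d) and log(eps0) <= -1
bound2 = -0.25 * np.log(eps0) - 0.25 * np.log(eps0) ** 2

worst_gap = -np.inf
for _ in range(1000):
    # Random covariance matrix rescaled so that |Sigma| = det_target <= eps0.
    a = rng.standard_normal((d, d))
    sigma = a @ a.T + 0.1 * np.eye(d)
    det_target = eps0 * rng.uniform(1e-3, 1.0)
    sigma *= (det_target / np.linalg.det(sigma)) ** (1.0 / d)
    t = np.log(np.linalg.det(sigma))
    mu = rng.standard_normal(d)

    # A point outside the ellipse: its quadratic form equals r**2 >= (log|Sigma|)**2.
    z = rng.standard_normal(d)
    z /= np.linalg.norm(z)
    r = abs(t) * (1.0 + rng.exponential())
    x = mu + r * np.linalg.cholesky(sigma) @ z

    logphi = multivariate_normal(mean=mu, cov=sigma).logpdf(x)
    bound1 = -0.5 * t - 0.5 * t ** 2
    assert logphi <= bound1 <= bound2
    worst_gap = max(worst_gap, logphi - bound2)

print(f"bound2 = {bound2:.2f}, largest log phi - bound2 = {worst_gap:.2f}")
```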
Step 2. We also show that, almost surely,

$$\sup_{G \in \Gamma_2} pl_n(G) - pl_n(G_0) \to -\infty.$$






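Recall that for each $i \in A$, $(x_i - \mu_1)^\top\Sigma_1^{-1}(x_i - \mu_1)$ is bounded by $(\log|\Sigma_1|)^2$. Hence, it is easy to verify that for $i \in A$,

$$\phi(x_i; \mu_1, \Sigma_1) \le |\Sigma_1|^{-1/2}\exp\{-\tfrac14(x_i - \mu_1)^\top\Sigma_1^{-1}(x_i - \mu_1)\}.$$

For $i \notin A$,

$$\phi(x_i; \mu_1, \Sigma_1) \le \exp\{-\tfrac14(x_i - \mu_1)^\top\Sigma_1^{-1}(x_i - \mu_1)\}.$$

Therefore, letting

$$g(x; G) = \pi_1\exp\{-\tfrac14(x - \mu_1)^\top\Sigma_1^{-1}(x - \mu_1)\} + \pi_2\phi(x; \mu_2, \Sigma_2)$$

(not a density itself), we have

$$\log f(x_i; G) \le \log g(x_i; G) + I(i \in A)\log|\Sigma_1|^{-1/2}.$$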

Hence, we get

$$l_n(G) \le n(A)\log|\Sigma_1|^{-1/2} + \sum_{i=1}^n\log g(x_i; G).$$
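The inequality $\log f(x; G) \le \log g(x; G) + I(i \in A)\log|\Sigma_1|^{-1/2}$ can be spot-checked numerically. The sketch below draws hypothetical two-component bivariate mixtures with $|\Sigma_1| \le \tau_0 = 10^{-3}$ (small enough that $\log|\Sigma_1| \le -2$, which the case $i \notin A$ relies on) and $\Sigma_2 = I_2$. It evaluates both sides at points inside and outside the ellipse that defines $A$ and asserts the inequality. All numerical choices are illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(3)
d, tau0 = 2, 1e-3  # hypothetical; log(tau0) is about -6.9 <= -2
mu1, mu2, sigma2 = np.zeros(d), np.array([3.0, 0.0]), np.eye(d)
n_inside = n_outside = 0

for _ in range(2000):
    # Random Sigma_1 rescaled so that |Sigma_1| <= tau0; random weight pi_1.
    a = rng.standard_normal((d, d))
    sigma1 = a @ a.T + 0.1 * np.eye(d)
    sigma1 *= (tau0 * rng.uniform(1e-3, 1.0) / np.linalg.det(sigma1)) ** (1.0 / d)
    log_det1 = np.log(np.linalg.det(sigma1))
    pi1 = rng.uniform(0.05, 0.95)

    # Half the points are placed around mu_1 (about half of those land in A);
    # the rest are spread more widely.
    if rng.uniform() < 0.5:
        z = rng.standard_normal(d)
        z /= np.linalg.norm(z)
        x = mu1 + rng.uniform(0.0, 2.0 * abs(log_det1)) * np.linalg.cholesky(sigma1) @ z
    else:
        x = 2.0 * rng.standard_normal(d)

    q1 = (x - mu1) @ np.linalg.solve(sigma1, x - mu1)
    in_a = q1 <= log_det1 ** 2
    phi2 = multivariate_normal(mean=mu2, cov=sigma2).pdf(x)

    log_f = np.log(pi1 * multivariate_normal(mean=mu1, cov=sigma1).pdf(x)
                   + (1 - pi1) * phi2)
    log_g = np.log(pi1 * np.exp(-0.25 * q1) + (1 - pi1) * phi2)
    rhs = log_g + (-0.5 * log_det1 if in_a else 0.0)
    assert log_f <= rhs + 1e-9
    n_inside += int(in_a)
    n_outside += int(not in_a)

print(n_inside, n_outside)
```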



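It is obvious that for any $G \in \Gamma_2$: (a) $E_0\{\log[g(X; G)/f(X; G_0)]\} < 0$, by Jensen's inequality and the fact that the integral of $g(x; G)$, namely $\pi_1(4\pi)^{d/2}|\Sigma_1|^{1/2} + \pi_2$, is less than 1 when $\tau_0$ is small; (b) $g(x; G) \le \varepsilon_0^{-1/2}$, by the definition of $\Gamma_2$. Hence, for each given $G \in \Gamma_2$, by the law of large numbers,

$$\frac1n\sum_{i=1}^n\log\frac{g(X_i; G)}{f(X_i; G_0)} \to E_0\left\{\log\frac{g(X; G)}{f(X; G_0)}\right\} < 0.$$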
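Properties (a) and (b) can be seen on a concrete example. The sketch below fixes a hypothetical two-component bivariate $G_0$ and a hypothetical $G$ with $|\Sigma_1| = 10^{-8}$ and $|\Sigma_2| = 4$. It computes the closed-form mass of $g(\cdot; G)$, which is $\pi_1(4\pi)^{d/2}|\Sigma_1|^{1/2} + \pi_2$, and a Monte Carlo average of $\log\{g(X_i; G)/f(X_i; G_0)\}$ over draws from $G_0$. By Jensen's inequality the limit of that average is at most $\log$ of the mass, hence negative.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(2)
d = 2

# Hypothetical true mixing distribution G0: two unit-covariance components.
g0_pis = np.array([0.5, 0.5])
g0_means = np.array([[-1.0, 0.0], [1.0, 0.0]])


def f0(x):
    """Density f(x; G0) of the hypothetical true mixture."""
    return sum(p * multivariate_normal(mean=m, cov=np.eye(d)).pdf(x)
               for p, m in zip(g0_pis, g0_means))


# Hypothetical G in Gamma_2: |Sigma_1| = 1e-8 is tiny, |Sigma_2| = 4 is not.
pi1, mu1, sigma1 = 0.3, np.zeros(d), 1e-4 * np.eye(d)
mu2, sigma2 = np.array([0.5, 0.0]), 2.0 * np.eye(d)


def g(x):
    """g(x; G) = pi_1 exp{-Q_1(x)/4} + pi_2 phi(x; mu_2, Sigma_2); not a density."""
    diff = x - mu1
    q1 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(sigma1), diff)
    return (pi1 * np.exp(-0.25 * q1)
            + (1 - pi1) * multivariate_normal(mean=mu2, cov=sigma2).pdf(x))


# Total mass of g in closed form: pi_1 (4 pi)^{d/2} |Sigma_1|^{1/2} + pi_2.
mass = pi1 * (4 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(sigma1)) + (1 - pi1)

# Draw X_1, ..., X_n from G0 and average log{g(X_i; G) / f(X_i; G0)}.
n = 100_000
labels = rng.choice(2, size=n, p=g0_pis)
x = g0_means[labels] + rng.standard_normal((n, d))
avg = float(np.mean(np.log(g(x)) - np.log(f0(x))))

print(f"mass of g = {mass:.4f}, log(mass) = {np.log(mass):.4f}, average = {avg:.4f}")
```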

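For each fixed $x$, we can extend the definition of $g(x; G)$ in $G$ onto the compactified $\Gamma_2$ while maintaining properties (a) and (b) and its continuity in $G$. Thus, a classical technique as in [23] can be readily employed to show that, as $n \to \infty$,

$$\sup_{G \in \Gamma_2}\left\{\frac1n\sum_{i=1}^n\log\frac{g(X_i; G)}{f(X_i; G_0)}\right\} \to -\delta(\tau_0) < 0 \qquad (3)$$

almost surely, for some decreasing function $\delta(\tau_0)$. Hence, it is possible to choose a small enough $\tau_0 < \varepsilon_0$ such that

$$\sup_{\Gamma_2} pl_n(G) - pl_n(G_0) \le \sup_{\Gamma_2}\{n(A)\log|\Sigma_1|^{-1/2} + p_n(G)\} + \sup_{\Gamma_2}\sum_{i=1}^n\log\frac{g(X_i; G)}{f(X_i; G_0)} - p_n(G_0)$$
$$\le 8M\tau_0^{1/(2d)}(\log\tau_0)^2 n - \delta(\varepsilon_0)n + o(n),$$

where we used $\delta(\tau_0) \ge \delta(\varepsilon_0)$, which holds because $\delta$ is decreasing and $\tau_0 < \varepsilon_0$.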
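The first term in the last line above comes from the assessment of $n(A)$ and from C3 on $p_n(G)$; the $o(n)$ term absorbs $p_n(G_0) = o(n)$ and the approximation error in (3). Taking $\delta_0 = \delta(\varepsilon_0)$ in condition 4, the bound is at most $-\tfrac12\delta(\varepsilon_0)n + o(n)$. Therefore, almost surely,

$$\sup_{G \in \Gamma_2} pl_n(G) - pl_n(G_0) \to -\infty.$$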

Step 3. From the above two steps, we know that $\hat G_n \in \Gamma_3$ with probability 1 for all large enough $n$. At the same time, when $G \in \Gamma_3$, we have $p_n(G) = o(n)$. By the definition of the maximum penalized likelihood estimator, we have

$$l_n(\hat G_n) - l_n(G_0) \ge p_n(G_0) - p_n(\hat G_n) = o(n). \qquad (4)$$

Since the parameter space is now completely regular, an estimator with property (4) is easily shown to be consistent by the classical technique [23], even with a penalty of size $o(n)$. □

Proof of Theorem 3: When $p_0 < p < \infty$, we cannot expect every part of $G$ to converge to that of $G_0$. Instead, we measure their difference as two distributions. Let

$$H(G, G_0) = \int|G(\lambda) - G_0(\lambda)|\exp\{-|\lambda|\}\,d\lambda,$$

where

$$\lambda = (\mu_1, \mu_2, \ldots, \mu_d, \sigma_{11}, \sigma_{12}, \sigma_{22}, \ldots, \sigma_{dd}) \in \mathbb{R}^d \times \Lambda,$$
$$|\lambda| = \sum_{j=1}^d|\mu_j| + \sum_{i=1}^d\sum_{j=1}^i|\sigma_{ij}|,$$

and $\Lambda$ is the subset of $\mathbb{R}^{d(d+1)/2}$ containing all combinations of $d(d+1)/2$ real numbers that form a symmetric positive definite matrix. It is well known that $\Lambda$ is an open connected subset of $\mathbb{R}^{d(d+1)/2}$; it is regular enough, although its shape may not be easy to visualize.

It can be shown that $H(\hat G_n, G_0) \to 0$ implies $\hat G_n \to G_0$ in distribution. An estimator $\hat G_n$ is strongly consistent if $H(\hat G_n, G_0) \to 0$ almost surely.
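For $d = 1$, $\lambda = (\mu_1, \sigma_{11})$ ranges over $\mathbb{R} \times (0, \infty)$ and $|\lambda| = |\mu_1| + \sigma_{11}$. Assuming $G(\lambda)$ denotes the joint cumulative distribution function of the mixing distribution (our reading of the notation), $H$ between two finitely supported mixing distributions can be computed exactly: the integrand is constant on the rectangles cut out by the support points, and $\exp\{-|\lambda|\}$ factorizes. The sketch below does this and shows $H(G_k, G_0) \to 0$ for a hypothetical sequence of two-component $G_k$ collapsing onto a one-component $G_0$, the situation $p = 2$, $p_0 = 1$ considered in this proof.

```python
import math
from itertools import product


def exp_abs_mass(a, b):
    """Integral of exp(-|u|) over [a, b]; a and b may be +/- infinity."""
    def antiderivative(u):  # F(-inf) = 0, F(0) = 1, F(inf) = 2
        return math.exp(u) if u <= 0 else 2.0 - math.exp(-u)
    return antiderivative(b) - antiderivative(a)


def cdf(atoms, mu, var):
    """Joint CDF of a discrete mixing distribution: P(mean <= mu, variance <= var)."""
    return sum(w for w, m, v in atoms if m <= mu and v <= var)


def h_distance(atoms_g, atoms_g0):
    """H(G, G0) = integral of |G - G0| exp(-|mu| - var) over R x (0, inf), d = 1.

    atoms_g and atoms_g0 are lists of (weight, mean, variance) support points.
    """
    all_atoms = atoms_g + atoms_g0
    mu_breaks = sorted({-math.inf, math.inf} | {m for _, m, _ in all_atoms})
    var_breaks = sorted({0.0, math.inf} | {v for _, _, v in all_atoms})
    total = 0.0
    for (a, b), (c, e) in product(zip(mu_breaks, mu_breaks[1:]),
                                  zip(var_breaks, var_breaks[1:])):
        # Both CDFs are constant on [a, b) x [c, e); evaluate at the lower corner.
        diff = cdf(atoms_g, a, c) - cdf(atoms_g0, a, c)
        total += abs(diff) * exp_abs_mass(a, b) * exp_abs_mass(c, e)
    return total


g0 = [(1.0, 0.0, 1.0)]  # hypothetical G0: a single support point, N(0, 1)


def g_k(k):
    """Hypothetical two-component G_k whose support points approach that of G0."""
    return [(0.5, -1.0 / k, 1.0), (0.5, 1.0 / k, 1.0 + 1.0 / k)]


for k in [1, 2, 4, 8, 16]:
    print(k, round(h_distance(g_k(k), g0), 4))
```

For this particular sequence the integral can also be done by hand, giving $H(G_k, G_0) = e^{-1}(1 - e^{-1/k})(1 + \tfrac12 e^{-1/k})$, which decreases to 0 as $k$ grows.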

Again, for the sake of clarity, we consider only the special case $p = 2$, $p_0 = 1$, that is, fitting a two-component multivariate normal mixture model when the true distribution is a single (non-mixture) multivariate normal. The extension of our proof to general situations is straightforward; the major hurdle is merely a complicated presentation. Most intermediate conclusions in the proof of consistency of the PMLE when $p = p_0 = 2$ are still applicable; some need minor changes. We use many of these results and notations to give a brief proof.

For an arbitrarily small positive number $\delta$, define $\mathcal{H}(\delta) = \{G \in \Gamma : H(G, G_0) \ge \delta\}$. That is, $\mathcal{H}(\delta)$ contains all mixing distributions with up to $p$ components that are at a distance of at least $\delta > 0$ from the true mixing distribution $G_0$.



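Since $G_0 \notin \mathcal{H}(\delta)$, we have $E_0[\log\{g(X; G)/f(X; G_0)\}] < 0$ for any $G \in \mathcal{H}(\delta)$. Thus, (3) remains valid after being slightly revised as follows:

$$\sup_{G \in \mathcal{H}(\delta)\cap\Gamma_2} n^{-1}\sum_{i=1}^n\log\{g(X_i; G)/f(X_i; G_0)\} \to -\eta(\tau_0)$$

for some positive $\eta(\tau_0)$ depending on $\delta$. Because of this, the derivations in the proof of Theorem 1 still apply after $\Gamma_k$ is replaced by $\mathcal{H}(\delta)\cap\Gamma_k$ ($k = 1, 2$). That is, with proper choice of $\varepsilon_0$ and $\tau_0$, we similarly get, almost surely,

$$\sup_{G \in \mathcal{H}(\delta)\cap\Gamma_k} pl_n(G) - pl_n(G_0) \to -\infty, \qquad k = 1, 2.$$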

With what we have proved, the penalized maximum likelihood estimator $\hat G_n$ of $G$ must almost surely belong to $\mathcal{H}^c(\delta) \cup \Gamma_3$, where $\mathcal{H}^c(\delta)$ is the complement of $\mathcal{H}(\delta)$. Since $\delta$ is arbitrarily small, $\hat G_n \in \mathcal{H}^c(\delta)$ implies $H(\hat G_n, G_0) \to 0$. On the other hand, $\hat G_n \in \Gamma_3$ is equivalent to putting a positive lower bound on the component variances, which also implies $H(\hat G_n, G_0) \to 0$ by [10]. That is, the PMLE is also consistent when $p = 2$ but $p_0 = 1$.

A generalization of the above derivation leads to the conclusion of Theorem 
3. 
Table 1
Number of degeneracies

                              Variance configuration
Mean configuration        1      2      3      4      5      6

2-component bivariate normal mixture
  near                    ?     11     19      5     40      8
  moderate             1911   3256    441      6   2523    157
  distant              4997   4998   4966   4782   4998   4943

3-component bivariate normal mixture
  straight             3049   5058   4947   1998   2306   2491
  acute                2888   4505   4812   4052   4057   4561
  obtuse               3253   4980   4983   2885   3022   3511

2-component trivariate normal mixture
  near                    1   4872   5003   4866   4961   1466
  moderate             4011   5000   5001   5000   5000   4900
  distant              5000   5000   5000   5000   5000   5000

3-component trivariate normal mixture
  straight             5009   5010   5002   5002   5000   5000
  acute                5006   5034   5000   5002   5000   5000
  obtuse               5009   5038   5002   5004   5000   5001
Table 2
Bias (std) under 2-component bivariate normal mixture models.

                   MLE            PMLE1          PMLE2
Model I.1.1, component 1
  π1 = 0.3        -0.03 (0.11)   -0.02 (0.11)   -0.01 (0.10)
  μ1 = 0          -0.16 (0.53)   -0.16 (0.53)   -0.13 (0.50)
  μ2 = -1          ?              0.72 (1.17)    0.71 (1.14)
  σ11 = 1         -0.14 (0.4?)   -0.14 (0.40)   -0.13 (0.37)
  σ12 = 0         -0.01 (0.39)    0.00 (0.38)    0.00 (0.34)
  σ22 = 1         -0.03 (0.71)   -0.03 (0.70)   -0.01 (0.64)
Model I.1.1, component 2
  π2 = 0.7         0.03 (0.11)    0.02 (0.11)    0.01 (0.10)
  μ1 = 0           0.04 (0.19)    0.04 (0.19)    0.04 (0.19)
  μ2 = 1          -0.3? (0.4?)   -0.39 (0.47)   -0.37 (0.48)
  σ11 = 1         -0.07 (0.18)   -0.07 (0.18)   -0.07 (0.18)
  σ12 = 0          0.00 (0.19)    0.00 (0.19)    0.00 (0.19)
  σ22 = 1          0.33 (0.44)    0.33 (0.44)    0.30 (0.43)
Model I.2.4, component 1
  π1 = 0.3         0.00 (0.03)    0.00 (0.03)    0.00 (0.03)
  μ1 = 0          -0.02 (0.28)   -0.02 (0.28)   -0.02 (0.28)
  μ2 = -3         -0.01 (0.13)   -0.01 (0.13)   -0.01 (0.13)
  σ11 = 5         -0.04 (0.93)   -0.04 (0.93)   -0.04 (0.93)
  σ12 = 0          0.00 (0.30)    0.00 (0.30)    0.00 (0.30)
  σ22 = 1         -0.02 (0.19)   -0.02 (0.19)    0.00 (0.19)
Model I.2.4, component 2
  π2 = 0.7         0.00 (0.03)    0.00 (0.03)    0.00 (0.03)
  μ1 = 0           0.00 (0.09)    0.00 (0.09)    0.00 (0.09)
  μ2 = 3           0.00 (0.09)    0.00 (0.09)    0.00 (0.09)
  σ11 = 1         -0.01 (0.12)   -0.01 (0.12)   -0.01 (0.12)
  σ12 = 0          0.00 (0.08)    0.00 (0.08)    0.00 (0.08)
  σ22 = 1          0.00 (0.12)    0.00 (0.12)    0.00 (0.12)
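Each entry in these tables reports a bias with a standard deviation in parentheses. A minimal sketch of how such an entry can be computed from replicate estimates of one parameter follows. It assumes that the bias is the average estimate minus the true value, that "std" is the sample standard deviation across replications (divisor $R - 1$; the divisor actually used is not stated here), and that component labels have already been matched to the true components. The replicate values are hypothetical.

```python
import numpy as np


def bias_std(estimates, true_value):
    """Return 'bias (std)' for replicate estimates of a single parameter."""
    est = np.asarray(estimates, dtype=float)
    bias = est.mean() - true_value
    std = est.std(ddof=1)  # sample standard deviation across replications
    return f"{bias:.2f} ({std:.2f})"


# Hypothetical estimates of pi_1 (true value 0.3) from four replications.
pi1_hat = np.array([0.27, 0.33, 0.29, 0.35])
print(bias_std(pi1_hat, 0.3))      # 0.01 (0.04)
print(bias_std(1 - pi1_hat, 0.7))  # -0.01 (0.04)
```

For a two-component fit the estimated weights satisfy $\hat\pi_2 = 1 - \hat\pi_1$, so the two weight biases are negatives of each other and their standard deviations coincide, as the second line shows.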
Table 3
Bias (std) under 3-component bivariate normal mixture models.

                   MLE            PMLE1          PMLE2
Model II.1.1, component 1
  π1 = 0.15       -0.10 (0.06)   -0.08 (0.07)   -0.04 (0.07)
  μ1 = 0           0.69 (1.15)    0.58 (1.28)    0.25 (1.01)
  μ2 = -2          1.17 (2.48)    1.15 (2.32)    1.24 (1.94)
  σ11 = 1         -0.33 (0.91)   -0.46 (0.60)   -0.33 (0.52)
  σ12 = 0         -0.04 (0.54)   -0.02 (0.46)    0.02 (0.48)
  σ22 = 1         -0.22 (1.16)   -0.22 (1.01)    0.12 (1.01)
Model II.1.1, component 2
  π2 = 0.35       -0.02 (0.10)   -0.02 (0.10)   -0.03 (0.08)
  μ1 = 0          -0.10 (0.39)   -0.08 (0.38)   -0.06 (0.39)
  μ2 = 0           0.61 (1.54)    0.63 (1.53)    0.56 (1.44)
  σ11 = 1         -0.13 (0.29)   -0.13 (0.30)   -0.14 (0.31)
  σ12 = 0          0.02 (0.32)    0.01 (0.33)    0.02 (0.34)
  σ22 = 1          0.24 (0.70)    0.20 (0.71)    0.22 (0.69)
Model II.1.1, component 3
  π3 = 0.5         0.11 (0.11)    0.10 (0.12)    0.06 (0.10)
  μ1 = 0           0.02 (0.20)    0.01 (0.21)    0.01 (0.24)
  μ2 = 2          -1.23 (0.90)   -1.16 (0.89)   -1.02 (0.89)
  σ11 = 1         -0.08 (0.16)   -0.08 (0.17)   -0.10 (0.19)
  σ12 = 0          0.03 (0.26)    0.03 (0.27)    0.00 (0.28)
  σ22 = 1          0.86 (0.68)    0.81 (0.70)    0.65 (0.67)
Table 4
Bias (std) under 3-component bivariate normal mixture models.

                   MLE            PMLE1          PMLE2
Model II.2.4, component 1
  π1 = 0.15        0.00 (0.04)    0.01 (0.04)    0.01 (0.03)
  μ1 = 0           0.23 (0.86)    0.18 (0.74)    0.19 (0.72)
  μ2 = -2          0.12 (0.83)    0.11 (0.63)    0.11 (0.54)
  σ11 = 1          0.07 (0.69)    0.06 (0.60)    0.10 (0.59)
  σ12 = 0         -0.05 (0.54)   -0.03 (0.40)   -0.04 (0.38)
  σ22 = 1          0.17 (0.99)    0.18 (0.95)    0.20 (0.90)
Model II.2.4, component 2
  π2 = 0.35       -0.01 (0.05)   -0.01 (0.05)   -0.01 (0.05)
  μ1 = 3          -0.43 (1.12)   -0.40 (1.09)   -0.38 (1.08)
  μ2 = 0           0.15 (0.82)    0.14 (0.80)    0.13 (0.79)
  σ11 = 1          0.37 (1.12)    0.33 (1.05)    0.31 (1.03)
  σ12 = 0         -0.01 (0.35)   -0.02 (0.34)   -0.03 (0.37)
  σ22 = 5         -0.69 (1.60)   -0.65 (1.57)   -0.62 (1.55)
Model II.2.4, component 3
  π3 = 0.5         0.00 (0.05)    0.00 (0.05)    0.00 (0.05)
  μ1 = 0           0.33 (0.88)    0.31 (0.88)    0.30 (0.87)
  μ2 = 2          -0.19 (0.57)   -0.17 (0.53)   -0.16 (0.51)
  σ11 = 5         -0.38 (1.31)   -0.36 (1.31)   -0.36 (1.30)
  σ12 = 0          0.00 (0.28)   -0.01 (0.26)   -0.01 (0.27)
  σ22 = 1          0.37 (1.15)    0.34 (1.11)    0.33 (1.08)
Table 5
Bias (std) under 2-component trivariate normal mixture models.

                   MLE            PMLE1          PMLE2
Model III.1.1, component 1
  π1 = 0.3        -0.09 (0.15)   -0.08 (0.15)   -0.05 (0.14)
  μ1 = 0          -0.2? (0.61)   -0.26 (0.58)   -0.17 (0.5?)
  μ2 = 0          -0.15 (0.58)   -0.14 (0.57)   -0.09 (0.52)
  μ3 = -1          0.52 (0.09)    0.54 (0.11)    0.61 (0.09)
  σ11 = 1         -0.12 (0.47)   -0.11 (0.46)   -0.11 (0.36)
  σ12 = 0         -0.01 (0.38)    0.00 (0.35)    0.02 (0.27)
  σ13 = 0         -0.10 (0.48)   -0.10 (0.47)   -0.07 (0.37)
  σ22 = 1         -0.09 (0.56)   -0.11 (0.47)   -0.13 (0.36)
  σ23 = 0         -0.04 (0.49)   -0.02 (0.47)   -0.01 (0.37)
  σ33 = 1          0.22 (0.91)    0.18 (0.83)    0.12 (0.66)
Model III.1.1, component 2
  π2 = 0.7         0.09 (0.15)    0.08 (0.15)    0.05 (0.14)
  μ1 = 0           0.01 (0.15)    0.01 (0.15)    0.01 (0.16)
  μ2 = 0           0.02 (0.15)    0.02 (0.15)    0.02 (0.17)
  μ3 = 1          -0.45 (0.41)   -0.44 (0.41)   -0.42 (0.44)
  σ11 = 1         -0.05 (0.13)   -0.05 (0.13)   -0.05 (0.14)
  σ12 = 0          0.00 (0.10)    0.00 (0.10)    0.00 (0.10)
  σ13 = 0         -0.02 (0.13)   -0.02 (0.13)   -0.02 (0.14)
  σ22 = 1          0.03 (0.13)   -0.03 (0.13)   -0.04 (0.14)
  σ23 = 0          0.01 (0.14)    0.01 (0.14)    0.01 (0.15)
  σ33 = 1          0.44 (0.38)    0.43 (0.38)    0.39 (0.39)
Table 6
Bias (std) under 2-component trivariate normal mixture models.

                   MLE            PMLE1          PMLE2
Model III.2.4, component 1
  π1 = 0.3         0.00 (0.04)    0.00 (0.04)    0.00 (0.04)
  μ1 = 0           0.01 (0.13)    0.01 (0.13)    0.01 (0.13)
  μ2 = 0           0.01 (0.22)    0.01 (0.22)    0.01 (0.22)
  μ3 = -3         -0.03 (0.52)   -0.03 (0.52)   -0.04 (0.52)
  σ11 = 1         -0.01 (0.17)   -0.01 (0.17)   -0.01 (0.17)
  σ12 = 0         -0.01 (0.20)   -0.01 (0.20)   -0.01 (0.19)
  σ13 = 0          0.03 (0.45)    0.03 (0.45)    0.03 (0.45)
  σ22 = 3         -0.05 (0.49)   -0.05 (0.49)   -0.04 (0.49)
  σ23 = 0          0.00 (0.75)    0.00 (0.75)    0.01 (0.75)
  σ33 = 10        -0.36 (2.10)   -0.36 (2.11)   -0.38 (2.09)
Model III.2.4, component 2
  π2 = 0.7         0.00 (0.04)    0.00 (0.04)    0.00 (0.04)
  μ1 = 0           0.00 (0.15)    0.00 (0.15)    0.00 (0.15)
  μ2 = 0          -0.01 (0.19)   -0.01 (0.19)   -0.01 (0.19)
  μ3 = 3          -0.01 (0.11)   -0.01 (0.11)   -0.01 (0.11)
  σ11 = 4.87      -0.03 (0.47)   -0.03 (0.48)   -0.03 (0.47)
  σ12 = -3.23      0.03 (0.49)    0.03 (0.49)    0.03 (0.48)
  σ13 = -0.5       0.01 (0.23)    0.01 (0.23)    0.01 (0.23)
  σ22 = 7.2       -0.07 (0.71)   -0.07 (0.72)   -0.07 (0.71)
  σ23 = 2.16      -0.02 (0.30)   -0.02 (0.30)   -0.02 (0.30)
  σ33 = 1.94      -0.01 (0.22)   -0.01 (0.22)    0.00 (0.22)
Table 7
Bias (std) under 2-component trivariate normal mixture models.

                   MLE            PMLE1          PMLE2
Model III.3.6, component 1
  π1 = 0.3         0.00 (0.03)    0.00 (0.03)    0.00 (0.03)
  μ1 = 0           0.00 (0.10)    0.00 (0.10)    0.00 (0.10)
  μ2 = 0           0.01 (0.19)    0.01 (0.19)    0.00 (0.19)
  μ3 = -5          0.01 (0.37)    0.01 (0.37)    0.01 (0.37)
  σ11 = 1         -0.01 (0.15)   -0.01 (0.15)   -0.01 (0.15)
  σ12 = 0          0.01 (0.18)    0.01 (0.18)    0.01 (0.18)
  σ13 = 0          0.02 (0.36)    0.02 (0.36)    0.02 (0.36)
  σ22 = 3         -0.05 (0.45)   -0.05 (0.45)   -0.04 (0.45)
  σ23 = 0         -0.02 (0.64)   -0.02 (0.64)   -0.02 (0.64)
  σ33 = 10        -0.06 (1.81)   -0.06 (1.81)   -0.06 (1.80)
Model III.3.6, component 2
  π2 = 0.7         0.00 (0.03)    0.00 (0.03)    0.00 (0.03)
  μ1 = 0           0.00 (0.15)    0.00 (0.15)    0.00 (0.15)
  μ2 = 0           0.00 (0.19)    0.00 (0.19)    0.00 (0.19)
  μ3 = 5           0.00 (0.10)    0.00 (0.10)    0.00 (0.10)
  σ11 = 4.87      -0.05 (0.46)   -0.05 (0.46)   -0.05 (0.46)
  σ12 = 3.23      -0.03 (0.46)   -0.03 (0.46)   -0.03 (0.46)
  σ13 = -0.5       0.00 (0.22)    0.00 (0.22)    0.00 (0.22)
  σ22 = 7.2       -0.02 (0.70)   -0.02 (0.70)   -0.03 (0.70)
  σ23 = -2.16     -0.01 (0.29)   -0.01 (0.29)   -0.01 (0.29)
  σ33 = 1.94      -0.01 (0.20)   -0.01 (0.20)    0.00 (0.20)
Table 8
Bias (std) under 3-component trivariate normal mixture models.

                   MLE            PMLE1          PMLE2
Model IV.1.1, component 1
  π1 = 0.15       -0.05 (0.07)   -0.06 (0.07)   -0.01 (0.07)
  μ1 = 0           0.10 (0.64)    0.28 (0.97)    0.12 (0.69)
  μ2 = 0          -0.08 (0.64)    0.11 (0.97)   -0.04 (0.65)
  μ3 = -2          3.07 (2.16)    2.65 (2.17)    2.16 (1.89)
  σ11 = 1         -0.05 (0.73)   -0.25 (0.63)   -0.19 (0.47)
  σ12 = 0          0.07 (0.50)    0.05 (0.40)    0.04 (0.35)
  σ13 = 0         -0.01 (0.58)    0.00 (0.51)    0.00 (0.48)
  σ22 = 1         -0.04 (0.74)   -0.23 (0.63)   -0.16 (0.47)
  σ23 = 0          0.03 (0.51)    0.03 (0.47)    0.04 (0.43)
  σ33 = 1         -0.01 (1.16)    0.01 (1.19)    0.31 (1.05)
Model IV.1.1, component 2
  π2 = 0.35       -0.05 (0.09)   -0.07 (0.11)   -0.05 (0.09)
  μ1 = 0          -0.05 (0.33)   -0.10 (0.45)   -0.02 (0.37)
  μ2 = 0           0.04 (0.33)   -0.02 (0.43)    0.01 (0.34)
  μ3 = 0           0.00 (1.47)    0.02 (1.52)    0.26 (1.42)
  σ11 = 1         -0.09 (0.26)   -0.12 (0.32)   -0.11 (0.29)
  σ12 = 0          0.02 (0.20)    0.01 (0.23)    0.02 (0.21)
  σ13 = 0         -0.05 (0.32)   -0.05 (0.41)   -0.03 (0.35)
  σ22 = 1         -0.09 (0.28)   -0.11 (0.30)   -0.11 (0.28)
  σ23 = 0          0.02 (0.33)   -0.01 (0.37)    0.01 (0.33)
  σ33 = 1          0.46 (0.83)    0.48 (0.93)    0.46 (0.84)
Model IV.1.1, component 3
  π3 = 0.5         0.10 (0.12)    0.13 (0.15)    0.06 (0.12)
  μ1 = 0           0.01 (0.19)    0.00 (0.18)    0.00 (0.21)
  μ2 = 0          -0.01 (0.18)   -0.01 (0.17)    0.00 (0.21)
  μ3 = 2          -0.96 (0.81)   -1.00 (0.79)   -0.97 (0.86)
  σ11 = 1         -0.07 (0.17)   -0.07 (0.17)   -0.08 (0.19)
  σ12 = 0          0.01 (0.12)    0.00 (0.11)    0.01 (0.13)
  σ13 = 0         -0.04 (0.22)   -0.04 (0.22)   -0.04 (0.24)
  σ22 = 1         -0.06 (0.16)   -0.06 (0.16)   -0.07 (0.18)
  σ23 = 0          0.04 (0.22)    0.03 (0.22)    0.03 (0.25)
  σ33 = 1          0.76 (0.72)    0.88 (0.77)    0.75 (0.76)
Table 9
Bias (std) under 3-component trivariate normal mixture models.

                   MLE            PMLE1          PMLE2
Model IV.2.4, component 1
  π1 = 0.15        0.00 (0.05)    0.00 (0.04)    0.01 (0.04)
  μ1 = 0           0.04 (0.43)    0.04 (0.37)    0.02 (0.29)
  μ2 = 0           0.20 (0.96)    0.20 (0.90)    0.24 (0.88)
  μ3 = -2          0.19 (0.86)    0.17 (0.80)    0.20 (0.80)
  σ11 = 1          0.05 (0.63)    0.02 (0.52)    0.01 (0.38)
  σ12 = 0         -0.03 (0.54)   -0.01 (0.41)   -0.01 (0.34)
  σ13 = 0          0.04 (0.79)    0.01 (0.58)    0.01 (0.35)
  σ22 = 1          0.18 (1.06)    0.13 (0.81)    0.18 (0.73)
  σ23 = 0         -0.15 (1.09)   -0.10 (0.65)   -0.09 (0.62)
  σ33 = 1          0.65 (2.52)    0.53 (2.17)    0.68 (2.31)
Model IV.2.4, component 2
  π2 = 0.35       -0.01 (0.06)   -0.01 (0.06)   -0.02 (0.06)
  μ1 = 0           0.01 (0.19)    0.01 (0.19)    0.01 (0.18)
  μ2 = 3          -0.51 (1.25)   -0.46 (1.21)   -0.34 (1.13)
  μ3 = 0           0.24 (0.94)    0.21 (0.91)    0.13 (0.86)
  σ11 = 1          0.56 (1.54)    0.50 (1.47)    0.35 (1.27)
  σ12 = 0         -0.49 (1.32)   -0.44 (1.26)   -0.32 (1.10)
  σ13 = 0          0.09 (0.42)    0.08 (0.42)    0.05 (0.41)
  σ22 = 3          0.48 (1.78)    0.41 (1.71)    0.20 (1.53)
  σ23 = 0         -0.33 (0.98)   -0.30 (0.96)   -0.25 (0.88)
  σ33 = 10        -1.40 (3.55)   -1.26 (3.45)   -1.03 (3.31)
Model IV.2.4, component 3
  π3 = 0.5         0.01 (0.05)    0.01 (0.05)    0.00 (0.05)
  μ1 = 0          -0.02 (0.18)   -0.02 (0.18)   -0.01 (0.19)
  μ2 = 0           0.37 (0.87)    0.34 (0.86)    0.27 (0.79)
  μ3 = 2          -0.28 (0.72)   -0.25 (0.68)   -0.17 (0.58)
  σ11 = 4.87      -0.57 (1.42)   -0.51 (1.36)   -0.39 (1.22)
  σ12 = -3.23      0.45 (1.24)    0.41 (1.20)    0.30 (1.07)
  σ13 = 0.5       -0.07 (0.33)   -0.06 (0.33)   -0.04 (0.32)
  σ22 = 7.2       -0.46 (1.48)   -0.42 (1.46)   -0.33 (1.38)
  σ23 = -2.16      0.31 (0.95)    0.27 (0.89)    0.18 (0.77)
  σ33 = 1.94       0.88 (2.23)    0.79 (2.16)    0.58 (1.90)
Table 10
Bias (std) under 3-component trivariate normal mixture models.

                   MLE            PMLE1          PMLE2
Model IV.3.6, component 1
  π1 = 0.15        0.00 (0.05)    0.00 (0.05)    0.00 (0.05)
  μ1 = 0           0.05 (0.41)    0.05 (0.41)    0.05 (0.40)
  μ2 = 0          -0.01 (0.64)   -0.01 (0.64)   -0.01 (0.61)
  μ3 = -2         -0.21 (1.23)   -0.21 (1.23)   -0.23 (1.20)
  σ11 = 1          0.28 (1.24)    0.28 (1.24)    0.24 (1.12)
  σ12 = 0         -0.19 (1.16)   -0.19 (1.16)   -0.15 (1.05)
  σ13 = 0          0.14 (1.04)    0.14 (1.03)    0.13 (0.99)
  σ22 = 3          0.21 (1.48)    0.21 (1.48)    0.18 (1.40)
  σ23 = 0         -0.42 (1.54)   -0.42 (1.54)   -0.39 (1.50)
  σ33 = 10        -1.37 (3.73)   -1.37 (3.73)   -1.31 (3.64)
Model IV.3.6, component 2
  π2 = 0.35       -0.01 (0.06)   -0.01 (0.06)   -0.01 (0.06)
  μ1 = 0          -0.01 (0.33)   -0.01 (0.33)    0.00 (0.32)
  μ2 = 3          -0.20 (0.61)   -0.2 (0.61)    -0.19 (0.60)
  μ3 = 0           0.25 (0.96)    0.25 (0.96)    0.26 (0.94)
  σ11 = 4.87      -0.15 (1.18)   -0.15 (1.18)   -0.13 (1.14)
  σ12 = -3.2       1.23 (2.89)    1.23 (2.89)    1.2 (2.87)
  σ13 = 0.5       -0.16 (0.62)   -0.16 (0.62)   -0.15 (0.62)
  σ22 = 7.2       -0.24 (1.56)   -0.24 (1.56)   -0.21 (1.52)
  σ23 = -2.16      0.21 (0.77)    0.21 (0.77)    0.19 (0.73)
  σ33 = 1.94       0.21 (1.61)    0.21 (1.61)    0.18 (1.52)
Model IV.3.6, component 3
  π3 = 0.5         0.02 (0.07)    0.02 (0.07)    0.02 (0.07)
  μ1 = 0          -0.02 (0.22)   -0.02 (0.22)   -0.02 (0.22)
  μ2 = 0           0.16 (0.43)    0.17 (0.43)    0.16 (0.43)
  μ3 = 2          -0.33 (0.68)   -0.33 (0.68)   -0.32 (0.68)
  σ11 = 4.87      -0.18 (0.66)   -0.18 (0.66)   -0.17 (0.65)
  σ12 = 3.23      -1.06 (2.14)   -1.06 (2.15)   -1.04 (2.15)
  σ13 = -0.5       0.17 (0.47)    0.17 (0.47)    0.16 (0.47)
  σ22 = 7.2       -0.21 (0.97)   -0.21 (0.98)   -0.20 (0.98)
  σ23 = -2.16      0.03 (0.45)    0.03 (0.45)    0.03 (0.46)
  σ33 = 1.94       0.03 (0.39)    0.03 (0.38)    0.03 (0.38)